Abstract

In this article we propose a quantum version of Shannon's conditional entropy. Given two density matrices $\rho$ and $\sigma$ on a finite dimensional Hilbert space, and with $S(\rho) = -\mathrm{Tr}\,\rho\ln\rho$ being the usual von Neumann entropy, this quantity $S(\rho|\sigma)$ is concave in $\rho$ and satisfies $0 \le S(\rho|\sigma) \le S(\rho)$, a quantum analogue of Shannon's famous inequality. Thus we view $S(\rho|\sigma)$ as the entropy of $\rho$ conditioned by $\sigma$. The second inequality is an equality if $\sigma$ is a multiple of the identity. In contrast to the classical case, however, $S(\rho|\rho) = 0$ if and only if the non-vanishing eigenvalues of $\rho$ are all non-degenerate. Also, in general and again in contrast to the corresponding classical situation, $S(\rho,\sigma) = S(\sigma) + S(\rho|\sigma)$ is not symmetric in $\rho$ and $\sigma$, even if they commute. We also show that there is no quantum version of conditional entropy in terms of two density matrices which shares more properties with the classical case and which in particular reduces to the classical case when the two density matrices commute. As an alternative we propose to use spectral resolutions of the unit matrix instead of density matrices. We briefly compare this with the algebraic approach of Connes and Størmer and of Connes, Narnhofer and Thirring.


1 Introduction 


The concept of entropy plays a major role in thermodynamics and statistical mechanics. It serves to describe the behavior of macroscopic systems. The name "entropy" was introduced by Clausius (1865) and derives from ἐντροπή, "transformation". It was von Neumann (1927 [?]) who generalized the classical expression of Boltzmann and Gibbs for the entropy to quantum mechanics by using the concept of what is now called a density matrix, also introduced quite generally by him in the same year [?]. In the special context of radiation damping the density matrix was discovered independently by L. Landau [10] and by F. Bloch [?], again in the same year (see also the citation in [?]). For a technical overview of the developments up to 1978 and with further historical references see [?]. For recent expositions see [?, ?]. In the theory of dynamical systems, entropy and the derived notion of topological entropy also play an important role; see e.g. the contributions in [?].

In a seminal article Shannon (1948, [22]) introduced the concept of entropy into information theory. Roughly speaking, a gain in information means a decrease in entropy. Shannon also provided the concept of conditional entropy, which measures how much entropy is reduced given pre-existing knowledge. To the author's best knowledge, the first construction in quantum mechanics coming close to such a notion is due to E. Lieb [12] (see also [26], [?]). It involves tensor product structures, and it was called a relative entropy in [12] (but a conditional entropy in [?, p. 259]). In view of recent developments in quantum computation and quantum coding (see [?], [18] for a concise account) it is highly desirable to have such a quantity at one's disposal. There is a construction of a non-commutative analogue of Shannon's conditional entropy by Connes and Størmer [?] and by Connes, Narnhofer and Thirring [?] (for an exposition and a discussion of further developments see e.g. [?]). More recently, attempts have been made to construct a mutual information analogous to Shannon's conditional entropy in the context of quantum error-correction. Two of these attempts [21, 13], made independently, yielded the same quantity. The first article exhibits necessary and sufficient conditions for quantum error-correction to be possible in terms of the mutual-information-like quantity given there, and a conjecture is made on its connection with quantum channel capacity, explored in more detail in [?]. The connection with channel capacity was also analyzed in [13]. In [17] its connection with entanglement is discussed. In yet another approach [11] the starting point is one density matrix on a tensor product. The conditioning is then obtained by looking at the two density matrices in the two subsystems resulting from taking the corresponding partial traces.

In this article we propose a different candidate for a quantum mechanical conditional entropy $S(\rho|\sigma) \ge 0$, a function of two density matrices $\rho$ and $\sigma$ on the same Hilbert space, having the interpretation of the entropy of $\rho$ conditioned by the "knowledge" given by $\sigma$. For simplicity we only discuss the finite dimensional case, although an extension to the infinite dimensional case seems possible. If we view $\rho$ as the analogue of $X$ and $\sigma$ as the analogue of $Y$, such that von Neumann's entropy $S(\rho)$ is the analogue of Shannon's entropy $H(X)$, then this conditional entropy shares several but not all properties of Shannon's conditional entropy $H(X|Y)$ (see section 3 for a brief recapitulation of Shannon's theory). In particular the "knowledge" of $\sigma$ reduces the entropy, i.e. the inequality $S(\rho|\sigma) \le S(\rho)$ holds. This corresponds exactly to Shannon's famous inequality $H(X|Y) \le H(X)$ and was the main motivation for our construction. Also, and again in analogy to the classical theory, we wanted the conditioning to be given by a quantity on the same footing as the original density matrix, i.e. the conditioning should also be given by a density matrix. If, as in the classical case, $\sigma$ contains no information, i.e. if it is a multiple of the identity such that $S(\sigma)$ is maximal, then $S(\rho|\sigma) = S(\rho)$. In contrast to the classical relation $H(X|X) = 0$, however, the relation $S(\rho|\rho) = 0$ holds if and only if the non-zero eigenvalues of $\rho$ are non-degenerate. In particular $S(\rho|\rho) = 0$ if $\rho$ is pure. We will not elaborate on the question whether the failure of our $S(\rho|\sigma)$ to satisfy all corresponding classical properties, like this last one, is due to a fundamental difference between quantum and classical information theory. In particular we will not provide a more detailed quantum mechanical interpretation of $S(\rho|\sigma)$. Also, so far we have not analyzed whether it may be used in the context of channel capacity. Rather, we will argue that other quantum mechanical versions of conditional entropy which share more properties with the classical counterpart $H(X|Y)$ do not exist.

The article is organized as follows. In section 2 we provide the construction of a quantum version $S(\rho|\sigma)$ of the conditional entropy and establish several of its properties. In section 3, after a brief review of Shannon's theory, we compare this with Shannon's conditional entropy. In section 4 we first present a list of desirable properties for a quantum version of conditional entropy given in terms of two density matrices. We then show that even some of these desiderata cannot be fulfilled simultaneously. In particular there is no version involving two density matrices which reduces to the classical case when these two density matrices commute. We provide an alternative in terms of resolutions of the unit matrix into orthogonal projections, which shares more properties with the classical case. We also briefly compare this ansatz with the algebraic constructions given by Connes and Størmer and by Connes, Narnhofer and Thirring.



2 Construction of a quantum conditional entropy 

Let $\rho$ be a density matrix on a finite dimensional Hilbert space $\mathcal H$, i.e. $\rho \ge 0$ and $\mathrm{Tr}\,\rho = 1$, where $\mathrm{Tr}$ denotes the canonical trace on $\mathcal H$. We write $\rho = \sum_i \rho_i P_i$ for the spectral representation of $\rho$, where the projections $P_i \ne 0$ are pairwise orthogonal (i.e. $P_iP_j = \delta_{ij}P_i$, $P_i^* = P_i$), such that $\rho_i \ge 0$, $\rho_i \ne \rho_j$ for $i \ne j$, and $\sum_i P_i = I$, where $I$ is the identity operator on $\mathcal H$. Thus $\mathrm{Tr}\,\rho = \sum_i \dim P_i\,\rho_i = 1$ with $\dim P = \mathrm{Tr}\,P = \dim P\mathcal H$ for any projection $P$. Here and in what follows projection operators are always understood to be orthogonal. With this notational convention the $P_i$ are canonically defined in terms of $\rho$. Since this fact will be crucial in what follows, let us briefly recall a standard proof. The eigenvalues $\rho_i$ (and their degeneracies $\dim P_i$) are of course uniquely determined by $\rho$ as solutions in $\lambda$ of the secular equation $\det(\lambda I - \rho) = 0$, a basis independent relation, such that $\det(\lambda I - \rho) = \prod_i(\lambda - \rho_i)^{\dim P_i}$. Order the $\rho_i$ in such a way that $1 \ge \rho_1 > \rho_2 > \rho_3 > \cdots$. Then $P_1 = \lim_{n\to\infty}(\rho/\rho_1)^n$, $P_2 = \lim_{n\to\infty}\big((\rho - \rho_1P_1)/\rho_2\big)^n$, etc.
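The canonical construction of the $P_i$ can be illustrated numerically. The following is only a sketch, assuming Python with NumPy; the helper `spectral_projections` and its tolerance `tol` are hypothetical choices of ours, needed because floating-point eigenvalues are never exactly equal. It groups eigenvectors with (numerically) equal eigenvalues and compares the resulting projections with the finite power $n = 200$ in the limit formulas for $P_1$ and $P_2$.

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """Canonical spectral resolution A = sum_i a_i P_i of a Hermitian matrix.

    Eigenvalues closer than `tol` (an assumed numerical tolerance) are treated
    as equal, so each P_i projects onto a full eigenspace.
    Returns a list of (a_i, P_i) with the a_i in increasing order.
    """
    w, V = np.linalg.eigh(A)
    groups = []  # entries: [eigenvalue, list of eigenvector column indices]
    for k, lam in enumerate(w):
        if groups and abs(lam - groups[-1][0]) < tol:
            groups[-1][1].append(k)
        else:
            groups.append([lam, [k]])
    return [(lam, V[:, idx] @ V[:, idx].conj().T) for lam, idx in groups]

# rho = diag(1/2, 1/4, 1/4): rho_1 = 1/2 (dim P_1 = 1), rho_2 = 1/4 (dim P_2 = 2).
rho = np.diag([0.5, 0.25, 0.25])
res = spectral_projections(rho)
rho1, P1 = res[-1]  # the largest eigenvalue comes last (increasing order)
rho2, P2 = res[-2]

# Finite-n approximations of P_1 = lim (rho/rho_1)^n and
# P_2 = lim ((rho - rho_1 P_1)/rho_2)^n.
n = 200
P1_limit = np.linalg.matrix_power(rho / rho1, n)
P2_limit = np.linalg.matrix_power((rho - rho1 * P1) / rho2, n)

print(np.allclose(P1, P1_limit), np.allclose(P2, P2_limit))  # prints: True True
```

Grouping by a tolerance is a numerical stand-in for the exact statement $\rho_i \ne \rho_j$; the power iteration only serves as a cross-check.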

The quantum mechanical entropy of $\rho$ is given as $S(\rho) = -\sum_i \dim P_i\,\rho_i\ln\rho_i$, which is continuous and concave in $\rho$ (for an account of sub-additivity and convexity properties of the entropy and related quantities see e.g. [18]). Let $\sigma$ be another density matrix on the same space $\mathcal H$ with the spectral representation $\sigma = \sum_j \sigma_jQ_j$, again written in a canonical way. We define the conditional entropy by

$$S(\rho|\sigma) = \sum_j \dim Q_j\,\sigma_j\,F(\rho,Q_j) = \sum_j \mathrm{Tr}(Q_j\sigma)\,F(\rho,Q_j) \qquad (1)$$

where

$$F(\rho,Q) = -\mathrm{Tr}\big(Q\rho Q\ln(Q\rho Q)\big) + \mathrm{Tr}(Q\rho Q)\ln\mathrm{Tr}(Q\rho Q) \qquad (2)$$

for any orthogonal projection $Q$. Since the $Q_j$'s and $\sigma_j$'s are well defined in terms of $\sigma$ and since trivially $Q\rho Q \ge 0$, $S(\rho|\sigma)$ is well defined. Also, as usual in this context, $A\ln A$ for any non-negative operator $A$ is defined in terms of the spectral representation of $A$ with the natural convention that $x\ln x|_{x=0} = 0$. If $Q\rho Q \ne 0$, then also $0 \ne \mathrm{Tr}\,Q\rho Q = \mathrm{Tr}\,Q\rho$, and we may write

$$F(\rho,Q) = \mathrm{Tr}(Q\rho)\,S(\rho_Q) \qquad (3)$$

with

$$\rho_Q = \frac{Q\rho Q}{\mathrm{Tr}(Q\rho Q)} \qquad (4)$$

being a density matrix. Actually we might use (3) instead of (2) as a definition of $F(\rho,Q)$, with the convention, usually made in similar contexts (see e.g. [?]), that 0 times something undefined is 0. Relation (3) shows that $F(\rho,Q) \ge 0$ for all $\rho$ and $Q$. Using (3) we may rewrite $S(\rho|\sigma)$ as

$$S(\rho|\sigma) = \sum_j \mathrm{Tr}(Q_j\sigma)\,\mathrm{Tr}(Q_j\rho)\,S(\rho_{Q_j}). \qquad (5)$$
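Definitions (1) to (5) translate into a few lines of linear algebra. The sketch below assumes Python with NumPy; the function names `F`, `cond_entropy` and `cond_entropy_eq5`, the random seed, and the small cutoffs used to implement $x\ln x|_{x=0} = 0$ are our own illustrative choices, not part of the construction. It evaluates $S(\rho|\sigma)$ once via (1) and (2) and once via (3) to (5), for a random $\rho$ and a $\sigma$ with two doubly degenerate eigenvalues.

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """Canonical spectral resolution of a Hermitian A as a list of (a_i, P_i)."""
    w, V = np.linalg.eigh(A)
    groups = []
    for k, lam in enumerate(w):
        if groups and abs(lam - groups[-1][0]) < tol:
            groups[-1][1].append(k)
        else:
            groups.append([lam, [k]])
    return [(lam, V[:, idx] @ V[:, idx].conj().T) for lam, idx in groups]

def tr_xlogx(A, eps=1e-12):
    """Tr(A ln A) for Hermitian A >= 0, dropping eigenvalues <= eps (0 ln 0 = 0)."""
    w = np.linalg.eigvalsh(A)
    w = w[w > eps]
    return float(np.sum(w * np.log(w)))

def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr rho ln rho."""
    return -tr_xlogx(rho)

def F(rho, Q):
    """Eq. (2): -Tr(Q rho Q ln Q rho Q) + Tr(Q rho Q) ln Tr(Q rho Q)."""
    A = Q @ rho @ Q
    t = float(np.trace(A).real)
    return -tr_xlogx(A) + (t * np.log(t) if t > 1e-12 else 0.0)

def cond_entropy(rho, sigma):
    """Eq. (1): S(rho|sigma) = sum_j Tr(Q_j sigma) F(rho, Q_j)."""
    return sum(float(np.trace(Q @ sigma).real) * F(rho, Q)
               for _, Q in spectral_projections(sigma))

def cond_entropy_eq5(rho, sigma):
    """Eq. (5): sum_j Tr(Q_j sigma) Tr(Q_j rho) S(rho_{Q_j}), skipping Q_j rho Q_j = 0."""
    total = 0.0
    for _, Q in spectral_projections(sigma):
        A = Q @ rho @ Q
        t = float(np.trace(A).real)
        if t > 1e-12:
            # rho_Q = A / t is the density matrix of eq. (4)
            total += float(np.trace(Q @ sigma).real) * t * entropy(A / t)
    return total

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho /= np.trace(rho).real                      # random density matrix
sigma = np.diag([0.375, 0.375, 0.125, 0.125])  # two doubly degenerate eigenvalues

print(cond_entropy(rho, sigma), cond_entropy_eq5(rho, sigma), entropy(rho))
```

The first two printed numbers agree up to rounding, and both lie between $0$ and $S(\rho)$.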

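There is yet another way of writing $F(\rho,Q)$. It uses the relative entropy $S_{\rm rel}(A,B) = \mathrm{Tr}\,A(\ln A - \ln B)$, which is finite for any $A \ge 0$ and $B > 0$ and non-negative whenever $\mathrm{Tr}\,A = \mathrm{Tr}\,B$. The relative entropy is lower semi-continuous in $A$ and jointly convex in $A$ and $B$, see e.g. [?]. Obviously $S_{\rm rel}(\lambda A, \lambda B) = \lambda\,S_{\rm rel}(A,B)$ holds for any $\lambda > 0$, and, whenever $Q\rho Q \ne 0$, we have

$$F(\rho,Q) = -S_{\rm rel}\big(Q\rho Q,\ \mathrm{Tr}(Q\rho Q)\,I\big) \qquad (6)$$

such that (terms with $Q_j\rho Q_j = 0$ being omitted)

$$S(\rho|\sigma) = -\sum_j \mathrm{Tr}(Q_j\sigma)\,S_{\rm rel}\big(Q_j\rho Q_j,\ \mathrm{Tr}(Q_j\rho Q_j)\,I\big). \qquad (7)$$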
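Representation (6) can be checked numerically in the same spirit. This is a hedged sketch assuming NumPy; `rel_entropy` is a hypothetical helper computing $\mathrm{Tr}\,A(\ln A - \ln B)$ from eigendecompositions and is valid only for $B > 0$. It compares $F(\rho,Q)$ with $-S_{\rm rel}(Q\rho Q, \mathrm{Tr}(Q\rho Q)\,I)$ for a random $\rho$ and a two-dimensional $Q$, and also checks the scaling property $S_{\rm rel}(\lambda A,\lambda B) = \lambda S_{\rm rel}(A,B)$.

```python
import numpy as np

def tr_xlogx(A, eps=1e-12):
    """Tr(A ln A) for Hermitian A >= 0 with 0 ln 0 = 0 (eigenvalues <= eps dropped)."""
    w = np.linalg.eigvalsh(A)
    w = w[w > eps]
    return float(np.sum(w * np.log(w)))

def F(rho, Q):
    """Eq. (2)."""
    A = Q @ rho @ Q
    t = float(np.trace(A).real)
    return -tr_xlogx(A) + (t * np.log(t) if t > 1e-12 else 0.0)

def rel_entropy(A, B):
    """S_rel(A, B) = Tr A (ln A - ln B) for A >= 0 and B > 0."""
    wb, Vb = np.linalg.eigh(B)
    log_B = (Vb * np.log(wb)) @ Vb.conj().T  # ln B via the spectral theorem
    return tr_xlogx(A) - float(np.trace(A @ log_B).real)

rng = np.random.default_rng(1)
d = 4
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real
Q = np.diag([1.0, 1.0, 0.0, 0.0])  # a two-dimensional projection

A = Q @ rho @ Q
t = float(np.trace(A).real)
lhs = F(rho, Q)
rhs = -rel_entropy(A, t * np.eye(d))  # right-hand side of eq. (6)
print(lhs, rhs)
```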

It is instructive to compare $S(\rho|\sigma)$ with $S(E_{\mathcal Q}(\rho))$, a comparison which actually motivated our construction of $S(\rho|\sigma)$. $E_{\mathcal Q}$ is the linear map on the set of linear operators $A$ on $\mathcal H$ given as $E_{\mathcal Q}(A) = \sum_j Q_jAQ_j$. The $Q_j$'s are as above, i.e. $\mathcal Q = \{Q_j\}$ is any set of pairwise orthogonal nonzero projection operators with $\sum_j Q_j = I$, which is called a resolution of the identity. $E_{\mathcal Q}$ is a conditional expectation (see e.g. [?]) whose range is the $*$-algebra consisting of all linear operators which commute with all $Q_j$. In particular $E_{\mathcal Q}$ maps density matrices into density matrices. More precisely, let $\mathcal B = \mathcal B(\mathcal H)$ be the $*$-algebra of all linear operators on $\mathcal H$, which is (isomorphic to) a full matrix algebra. Then $E_{\mathcal Q}(\mathcal B)$ is a $*$-sub-algebra of $\mathcal B$ and the direct sum of the $*$-sub-algebras $Q_j\mathcal BQ_j = \mathcal B(Q_j\mathcal H)$, which are (isomorphic to) full matrix algebras. Although any finite dimensional $*$-algebra is (isomorphic to) a direct sum of full matrix algebras, not all $*$-sub-algebras of $\mathcal B$ are of the form $E_{\mathcal Q}(\mathcal B)$ for a suitable $\mathcal Q$. As an example consider the algebra generated by $I$ alone. It can easily be shown that a $*$-sub-algebra is of this form if and only if it contains a maximal abelian sub-algebra. Also, $\mathcal Q$ may be recovered from $E_{\mathcal Q}(\mathcal B)$. Indeed, the $Q_j$'s are just the minimal self-adjoint idempotents (i.e. the orthogonal projections) in $E_{\mathcal Q}(\mathcal B)$ which are central. On the set of all spectral resolutions of the identity we furthermore introduce a partial ordering $\le$ by setting $\mathcal P \le \mathcal Q$ if to each $i$ there is a $j(i)$ (which is unique) such that $P_i \le Q_{j(i)}$. Note that each $j$ is of the form $j = j(i)$ for at least one $i$. Then in particular all $P_i$ commute with all $Q_j$. Also $\mathcal P \le \{I\}$ holds for all $\mathcal P$. It is easy to see that $\mathcal P \le \mathcal Q$ if and only if $E_{\mathcal P}(\mathcal B) \subseteq E_{\mathcal Q}(\mathcal B)$. With respect to these orderings $\mathcal P$, or equivalently $E_{\mathcal P}(\mathcal B)$, is minimal if and only if each $P_i$ is one-dimensional. $E_{\mathcal P}(\mathcal B)$ is then commutative with dimension equal to $\dim\mathcal H$. To sum up, with respect to the partial ordering $\le$ there is a unique maximal element, but there are many minimal elements in the set of spectral resolutions $\mathcal P$.



Now one has the well known result $S(E_{\mathcal Q}(\rho)) \ge S(\rho)$ (see e.g. [18] for a direct proof and [26] for the special case when $\dim Q_j = 1$ for all $j$; it is a special case of Uhlmann's monotonicity theorem [?], see also [?]). It means that projective measurements increase entropy, and it should be compared with the inequality $S(\rho|\sigma) \le S(\rho)$ to be proven below. Its interpretation is that of a projective measurement, described by the family $\mathcal Q$ of projections, on a system given by $\rho$, but where we never learn the result of the measurement. In contrast, $S(\rho|\sigma)$ is interpreted as a set of projective measurements given by the projections $Q_j$, each performed with probability $\dim Q_j\,\sigma_j$, where we learn each outcome $F(\rho,Q_j)$ separately. The sums in (1) and (5) then reflect the occurrence of quantum decoherence. In other words, one considers the family of density operators $\rho_{Q_j}$, $Q_j\rho Q_j \ne 0$, takes their von Neumann entropies and then forms the linear combination with the non-negative coefficients $\mathrm{Tr}(Q_j\sigma)\,\mathrm{Tr}(Q_j\rho)$.
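The monotonicity $S(E_{\mathcal Q}(\rho)) \ge S(\rho)$ is easy to observe numerically. The sketch assumes NumPy; `pinch` is a hypothetical name for $E_{\mathcal Q}$, and the resolution $\mathcal Q$ (one two-dimensional and two one-dimensional projections in a fixed basis) is an arbitrary illustrative choice.

```python
import numpy as np

def entropy(rho, eps=1e-12):
    """Von Neumann entropy -Tr rho ln rho (0 ln 0 = 0)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > eps]
    return float(-np.sum(w * np.log(w)))

def pinch(A, projections):
    """E_Q(A) = sum_j Q_j A Q_j for a resolution of the identity {Q_j}."""
    return sum(Q @ A @ Q for Q in projections)

rng = np.random.default_rng(2)
d = 4
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# Resolution of the identity: one 2-dim and two 1-dim projections.
Qs = [np.diag([1.0, 1.0, 0.0, 0.0]),
      np.diag([0.0, 0.0, 1.0, 0.0]),
      np.diag([0.0, 0.0, 0.0, 1.0])]
E_rho = pinch(rho, Qs)
print(entropy(rho), entropy(E_rho))
```

Besides the entropy increase, $E_{\mathcal Q}(\rho)$ is again a density matrix commuting with every $Q_j$, and $E_{\mathcal Q}$ is idempotent, as expected of a conditional expectation.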
By definition we have

$$F(\rho, Q = I) = S\big(\rho\,\big|\,\sigma = (1/\dim I)\cdot I\big) = S(\rho). \qquad (8)$$

We consider this property to be necessary for any other sensible definition of a conditional entropy involving two density matrices. It holds for Shannon's conditional entropy $H(X|Y)$ in the form $H(X|Y) = H(X)$ when $Y$ is the trivial partition, which means that there is no gain in information if $Y$ contains no information. We will return to this point in section 3.

Some additional remarks are in order. Since the quantity $S(\rho|\sigma)$ is supposed to be a quantum mechanical analogue of Shannon's conditional entropy $H(X|Y)$, $\rho$ corresponds to $X$ and $\sigma$ to $Y$. In analogy to the classical case, where $X$ and $Y$ may be considered as stochastic variables living on the same space, here the density matrices $\rho$ and $\sigma$ also live on the same space. Unfortunately, with this correspondence $S(\rho|\sigma)$ does not reduce to the classical case when $\rho$ and $\sigma$ commute (see (33) and its discussion in section 3). As a matter of fact, we shall argue in section 4 that a quantum conditional entropy with this property does not exist.

By construction we have the obvious invariance under unitary automorphisms

$$F(\rho,Q) = F(U\rho U^{-1},\ UQU^{-1}) \qquad (9)$$

for any $U \in \mathcal U(\mathcal H)$, the group of unitary operators on $\mathcal H$. This relation (9) immediately implies

$$S(U\rho U^{-1}\,|\,U\sigma U^{-1}) = S(\rho|\sigma) \qquad (10)$$

for all $U$. Relation (10) reflects the fact that $S(\rho|\sigma)$ is defined intrinsically and is in particular basis independent. Therefore this invariance property should also hold for any alternative, sensible definition of a quantum mechanical conditional entropy defined in terms of two density matrices. We shall comment on the classical analogue of (10) in section 4.
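The invariance (10) can be checked on random data. This sketch assumes NumPy and defines its own hypothetical helpers for (1) and (2); `random_unitary` takes the unitary factor of a QR decomposition of a complex Gaussian matrix, which is unitary but is not claimed to be Haar distributed. A degenerate $\sigma$ is used so that $S(\rho|\sigma)$ is not trivially zero.

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """Canonical spectral resolution of a Hermitian A as a list of (a_i, P_i)."""
    w, V = np.linalg.eigh(A)
    groups = []
    for k, lam in enumerate(w):
        if groups and abs(lam - groups[-1][0]) < tol:
            groups[-1][1].append(k)
        else:
            groups.append([lam, [k]])
    return [(lam, V[:, idx] @ V[:, idx].conj().T) for lam, idx in groups]

def tr_xlogx(A, eps=1e-12):
    """Tr(A ln A) with 0 ln 0 = 0."""
    w = np.linalg.eigvalsh(A)
    w = w[w > eps]
    return float(np.sum(w * np.log(w)))

def F(rho, Q):
    """Eq. (2)."""
    A = Q @ rho @ Q
    t = float(np.trace(A).real)
    return -tr_xlogx(A) + (t * np.log(t) if t > 1e-12 else 0.0)

def cond_entropy(rho, sigma):
    """Eq. (1)."""
    return sum(float(np.trace(Q @ sigma).real) * F(rho, Q)
               for _, Q in spectral_projections(sigma))

def random_unitary(d, rng):
    """Unitary factor of the QR decomposition of a complex Gaussian matrix."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    U, _ = np.linalg.qr(Z)
    return U

rng = np.random.default_rng(3)
d = 4
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real
V = random_unitary(d, rng)
sigma = V @ np.diag([0.4, 0.4, 0.1, 0.1]) @ V.conj().T  # degenerate spectrum

U = random_unitary(d, rng)
before = cond_entropy(rho, sigma)
after = cond_entropy(U @ rho @ U.conj().T, U @ sigma @ U.conj().T)
print(before, after)
```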

The next observation is also important. It is easy to see that $F(\rho,Q)$ is continuous in $\rho$ and $Q$ by the same arguments used to prove continuity of $S(\rho)$. Therefore $S(\rho|\sigma)$ is also continuous in $\rho$ for fixed $\sigma$. However, for fixed $\rho$, $S(\rho|\sigma)$ is not continuous in $\sigma$ everywhere. It is continuous on the dense open subset where the eigenvalues of $\sigma$ are non-degenerate; in fact, it is zero there (see Lemma 2.2 below). So this lack of continuity occurs where $\sigma$ has degenerate eigenvalues, and it is due to the fact that for $Q = Q' + Q''$ being the sum of two projections, both $\ne 0$ and orthogonal to each other, i.e. $Q'Q'' = 0$, in general one has

$$\dim Q\,F(\rho,Q) \ne \dim Q'\,F(\rho,Q') + \dim Q''\,F(\rho,Q''). \qquad (11)$$

To understand this, consider the case $\dim\mathcal H = 2$. Then $S(\rho|\sigma) = S(\rho)$ if $\sigma = \frac12 I$, and $S(\rho|\sigma) = 0$ otherwise. At the moment we do not know whether this lack of continuity of $S(\rho|\sigma)$ in $\sigma$ is a desirable feature or not, i.e. whether it can be understood quantum mechanically when we interpret $S(\rho|\sigma)$ as the entropy of $\rho$ conditioned by $\sigma$. Observe that a degeneracy typically occurs when a non-trivial symmetry is present. In other words, there is then a non-trivial non-abelian subgroup $\mathcal G = \mathcal G(\sigma)$ of $\mathcal U(\mathcal H)$ such that $U\sigma U^{-1} = \sigma$ for all $U \in \mathcal G$. Note that $\mathcal G$ always contains a subgroup isomorphic to the abelian group $U(1)^N$, $N = \dim\mathcal H$. In this picture a removal of degeneracies is related to a breakdown of symmetry, a familiar phenomenon in physics.
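The two-dimensional example can be reproduced with a short computation. Assuming NumPy and hypothetical helpers implementing (1) and (2), the sketch evaluates $S(\rho|\sigma)$ for $\sigma = \mathrm{diag}(\frac12+\epsilon, \frac12-\epsilon)$ and a random $\rho$. The tolerance `tol` used to decide whether two eigenvalues are degenerate is a numerical assumption; the jump itself is a property of the definition.

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """Canonical spectral resolution of a Hermitian A as a list of (a_i, P_i)."""
    w, V = np.linalg.eigh(A)
    groups = []
    for k, lam in enumerate(w):
        if groups and abs(lam - groups[-1][0]) < tol:
            groups[-1][1].append(k)
        else:
            groups.append([lam, [k]])
    return [(lam, V[:, idx] @ V[:, idx].conj().T) for lam, idx in groups]

def tr_xlogx(A, eps=1e-12):
    """Tr(A ln A) with 0 ln 0 = 0."""
    w = np.linalg.eigvalsh(A)
    w = w[w > eps]
    return float(np.sum(w * np.log(w)))

def F(rho, Q):
    """Eq. (2)."""
    A = Q @ rho @ Q
    t = float(np.trace(A).real)
    return -tr_xlogx(A) + (t * np.log(t) if t > 1e-12 else 0.0)

def cond_entropy(rho, sigma):
    """Eq. (1)."""
    return sum(float(np.trace(Q @ sigma).real) * F(rho, Q)
               for _, Q in spectral_projections(sigma))

rng = np.random.default_rng(4)
G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = G @ G.conj().T
rho /= np.trace(rho).real
S_rho = -tr_xlogx(rho)

eps_values = [0.0, 1e-3, 1e-6]
values = [cond_entropy(rho, np.diag([0.5 + e, 0.5 - e])) for e in eps_values]
for e, v in zip(eps_values, values):
    print(f"eps = {e:g}: S(rho|sigma) = {v:.6f}   (S(rho) = {S_rho:.6f})")
```

For $\epsilon \ne 0$ both $Q_j$ are one-dimensional, so every $F(\rho,Q_j)$ vanishes, whereas at $\epsilon = 0$ the single projection $Q = I$ gives $S(\rho)$ by (8).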
To proceed further, $F(\rho,Q) = 0$ if $Q\rho Q = 0$, which can happen for $Q \ne 0$ only if $\rho$ has zero as an eigenvalue, i.e. if $\rho$ is not strictly positive. Then also $(I-Q)\rho Q = Q\rho(I-Q) = 0$. In fact, by the Schwarz inequality, for any $\psi, \psi' \in \mathcal H$ we have

$$|\langle\psi, Q\rho(I-Q)\psi'\rangle| \le \|\rho^{1/2}Q\psi\|\,\|\rho^{1/2}(I-Q)\psi'\| = 0,$$

since $\|\rho^{1/2}Q\psi\|^2 = \langle\psi, Q\rho Q\psi\rangle = 0$. This also shows that $Q\rho Q = 0$ is equivalent to $Q\rho = 0$, which in turn, by the self-adjointness of $\rho$ and $Q$, is equivalent to $\rho Q = 0$. By the trivial identity

$$\rho = Q\rho Q + (I-Q)\rho Q + Q\rho(I-Q) + (I-Q)\rho(I-Q), \qquad (12)$$

valid for all $\rho$ and $Q$, we therefore also have $\rho = (I-Q)\rho(I-Q)$ whenever $Q\rho Q = 0$. Obviously (12) gives $\mathrm{Tr}\,\rho = \mathrm{Tr}\,Q\rho Q + \mathrm{Tr}\,(I-Q)\rho(I-Q)$, such that in particular the inequalities $0 \le \mathrm{Tr}\,Q\rho Q \le 1$ and $0 \le \mathrm{Tr}\,(I-Q)\rho(I-Q) \le 1$ hold for any $\rho$ and $Q$. By relation (3) we also have $F(\rho,Q) \ge 0$ and hence $S(\rho|\sigma) \ge 0$ for all $\rho$, $Q$ and $\sigma$. Now $S(\rho_Q) = 0$, $Q\rho Q \ne 0$, holds if and only if $\rho_Q$ is a pure state, i.e. a one-dimensional projection. Also, for $\dim Q = 1$ one always has $Q\rho Q = (\mathrm{Tr}\,Q\rho Q)\,Q$. We collect these observations in

Lemma 2.1. $F(\rho,Q) = 0$ if and only if $Q\rho Q$ is a multiple of a one-dimensional projection.

This multiple is allowed to be zero. To characterize the $Q$'s fulfilling the condition of the lemma, let $P(\rho) \ne 0$ be the projection operator onto the subspace corresponding to the non-zero eigenvalues, such that $P(\rho)\rho = \rho = \rho P(\rho)$, and in particular $P(\rho) = I$ if $\rho > 0$. Using the spectral representation of $\rho$ it is easy to see that $Q\rho Q$ is a multiple (possibly zero) of a one-dimensional projection if and only if $Q$ may be written as $Q = Q' + Q''$ with $\dim Q' \le 1$ and $P(\rho)Q'' = \rho Q'' = 0$.

More generally, consider the case where $Q\rho Q = (\mathrm{Tr}(Q\rho Q)/\dim Q')\cdot Q'$ holds for a suitable projection operator $Q'$, such that in particular $0 \ne Q' \le Q$, and $Q'$ is unique whenever $Q\rho Q \ne 0$. Then $F(\rho,Q) = (\mathrm{Tr}\,Q\rho Q)\ln\dim Q'$ and $\rho_Q = (1/\dim Q')\,Q'$. This gives the following lemma.

Lemma 2.2. If all non-zero eigenvalues of $\sigma$ are non-degenerate, then $S(\rho|\sigma) = 0$ for all $\rho$. More generally, if $Q_j\rho Q_j$ is a multiple (possibly zero) of some projection operator $Q'_j \le Q_j$ for all $j$ with $\sigma_j > 0$, then

$$S(\rho|\sigma) = \sum_j \mathrm{Tr}(Q_j\rho)\,\mathrm{Tr}(Q_j\sigma)\,\ln\dim Q'_j. \qquad (13)$$

Observe that $S(\rho|\sigma) = 0$ for all pure states $\sigma$ and all $\rho$. If $\rho$ is pure, then $Q\rho Q$ is a multiple of a pure state for all $Q$. Therefore $S(\rho|\sigma) = 0$ also holds for all $\sigma$ whenever $\rho$ is pure. Also, if $\rho\sigma = 0$, which is equivalent to $\mathrm{Tr}\,\rho\sigma = 0$ and which can happen only if neither $\rho$ nor $\sigma$ is strictly positive, then again $S(\rho|\sigma) = 0$. Sufficient (but not necessary) for the condition of Lemma 2.2 to hold is that to each $j$ with $\sigma_j > 0$ there is an $i(j)$ with $Q_j \le P_{i(j)}$. For these $j$'s, $Q'_j = Q_j$, $Q_j\rho Q_j = \rho_{i(j)}Q_j$ and hence $\mathrm{Tr}\,Q_j\rho = \rho_{i(j)}\dim Q_j$. This gives in particular

$$S(\rho|\rho) = \sum_i \rho_i^2\,(\dim P_i)^2\ln\dim P_i. \qquad (14)$$

Therefore the relation $S(\rho|\rho) = 0$ holds if and only if all the non-zero eigenvalues of $\rho$ are non-degenerate, the "if" part being a special case of Lemma 2.2.
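Formula (14) can be compared with a direct evaluation of (1) for $\sigma = \rho$. In the hedged sketch below (NumPy assumed, helper names hypothetical), $\rho$ has spectrum $(0.3, 0.3, 0.2, 0.1, 0.1)$ in a random basis, so (14) predicts $S(\rho|\rho) = (4\cdot 0.09 + 4\cdot 0.01)\ln 2 = 0.4\ln 2$. A second density matrix whose non-zero eigenvalues are non-degenerate (but whose eigenvalue $0$ is degenerate) should give $0$.

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """Canonical spectral resolution of a Hermitian A as a list of (a_i, P_i)."""
    w, V = np.linalg.eigh(A)
    groups = []
    for k, lam in enumerate(w):
        if groups and abs(lam - groups[-1][0]) < tol:
            groups[-1][1].append(k)
        else:
            groups.append([lam, [k]])
    return [(lam, V[:, idx] @ V[:, idx].conj().T) for lam, idx in groups]

def tr_xlogx(A, eps=1e-12):
    """Tr(A ln A) with 0 ln 0 = 0."""
    w = np.linalg.eigvalsh(A)
    w = w[w > eps]
    return float(np.sum(w * np.log(w)))

def F(rho, Q):
    """Eq. (2)."""
    A = Q @ rho @ Q
    t = float(np.trace(A).real)
    return -tr_xlogx(A) + (t * np.log(t) if t > 1e-12 else 0.0)

def cond_entropy(rho, sigma):
    """Eq. (1)."""
    return sum(float(np.trace(Q @ sigma).real) * F(rho, Q)
               for _, Q in spectral_projections(sigma))

def random_unitary(d, rng):
    """Unitary factor of the QR decomposition of a complex Gaussian matrix."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return np.linalg.qr(Z)[0]

rng = np.random.default_rng(5)
V = random_unitary(5, rng)
rho = V @ np.diag([0.3, 0.3, 0.2, 0.1, 0.1]) @ V.conj().T

# Right-hand side of eq. (14) from the canonical spectral resolution of rho.
rhs = sum(lam**2 * np.trace(P).real**2 * np.log(np.trace(P).real)
          for lam, P in spectral_projections(rho))
lhs = cond_entropy(rho, rho)

# Non-zero eigenvalues non-degenerate (only the eigenvalue 0 is degenerate).
rho_nd = V @ np.diag([0.5, 0.3, 0.2, 0.0, 0.0]) @ V.conj().T
print(lhs, rhs, 0.4 * np.log(2), cond_entropy(rho_nd, rho_nd))
```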
If, in addition to the property $Q_j \le P_{i(j)}$, the density matrix $\sigma$ is such that

$$\sum_{j:\,i(j)=i}\sigma_j\,(\dim Q_j)^2\ln\dim Q_j \le \rho_i\,(\dim P_i)^2\ln\dim P_i$$

holds for all $i$, then by (13) and (14) $S(\rho|\sigma) \le S(\rho|\rho)$. Note that this last condition is satisfied if

$$\sum_{j:\,i(j)=i}\sigma_j\,\dim Q_j \le \rho_i\,\dim P_i$$

holds, since trivially $\dim Q_j \le \dim P_{i(j)}$.

We return to a discussion of the general properties of $F(\rho,Q)$ and $S(\rho|\sigma)$. The first main result of this article shows that $S(\rho|\sigma)$ shares an important property with $S(\rho)$ (see e.g. [12, ?] for the classical and the quantum entropy and [14] for Shannon's conditional entropy and derived quantities).

Theorem 2.1. $F(\rho,Q)$ and $S(\rho|\sigma)$ are both concave in $\rho$.

Again we consider this property to be necessary for any sensible definition of a quantum conditional entropy. As for the entropy $S(\rho)$ itself, it states that mixing (in $\rho$) increases (conditional) entropy. On the other hand, the case $\dim\mathcal H = 2$ discussed above shows that in general $S(\rho|\sigma)$ for fixed $\rho$ is neither convex nor concave in $\sigma$. Intuitively it would be desirable to have concavity with respect to $\sigma$, since mixing the conditioning should increase conditional entropy.

The proof follows easily from the representations (6) and (7) and the known joint convexity of the relative entropy: since the map $\rho \mapsto \big(Q\rho Q, \mathrm{Tr}(Q\rho Q)\,I\big)$ is linear, $S_{\rm rel}\big(Q\rho Q, \mathrm{Tr}(Q\rho Q)\,I\big)$ is convex in $\rho$.

The second main result of this article shows in particular that $S(\rho|\sigma)$ satisfies Shannon's inequality.

Theorem 2.2. The following inequalities hold for all density matrices $\rho$ and $\sigma$ on a fixed finite dimensional Hilbert space:

$$0 \le S(\rho|\sigma) \le S(\rho). \qquad (15)$$

If $\rho > 0$, the last inequality is strict unless $\sigma = (1/\dim I)\cdot I$.
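Theorems 2.1 and 2.2 can at least be probed on random samples. The sketch below is not a proof, and its sampling scheme is our own assumption: NumPy, random density matrices of the form $GG^*/\mathrm{Tr}\,GG^*$, a $\sigma$ with two doubly degenerate eigenvalues in a random basis, and small numerical tolerances. It counts violations of concavity in $\rho$ along random mixtures and of the bounds (15).

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """Canonical spectral resolution of a Hermitian A as a list of (a_i, P_i)."""
    w, V = np.linalg.eigh(A)
    groups = []
    for k, lam in enumerate(w):
        if groups and abs(lam - groups[-1][0]) < tol:
            groups[-1][1].append(k)
        else:
            groups.append([lam, [k]])
    return [(lam, V[:, idx] @ V[:, idx].conj().T) for lam, idx in groups]

def tr_xlogx(A, eps=1e-12):
    """Tr(A ln A) with 0 ln 0 = 0."""
    w = np.linalg.eigvalsh(A)
    w = w[w > eps]
    return float(np.sum(w * np.log(w)))

def entropy(rho):
    return -tr_xlogx(rho)

def F(rho, Q):
    """Eq. (2)."""
    A = Q @ rho @ Q
    t = float(np.trace(A).real)
    return -tr_xlogx(A) + (t * np.log(t) if t > 1e-12 else 0.0)

def cond_entropy(rho, sigma):
    """Eq. (1)."""
    return sum(float(np.trace(Q @ sigma).real) * F(rho, Q)
               for _, Q in spectral_projections(sigma))

def random_density(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    R = G @ G.conj().T
    return R / np.trace(R).real

def random_unitary(d, rng):
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return np.linalg.qr(Z)[0]

rng = np.random.default_rng(6)
d = 4
violations = 0
for _ in range(200):
    rho1, rho2 = random_density(d, rng), random_density(d, rng)
    V = random_unitary(d, rng)
    sigma = V @ np.diag([0.4, 0.4, 0.1, 0.1]) @ V.conj().T
    lam = rng.uniform()
    mix = lam * rho1 + (1 - lam) * rho2
    s1, s2, s_mix = (cond_entropy(r, sigma) for r in (rho1, rho2, mix))
    concave_ok = s_mix >= lam * s1 + (1 - lam) * s2 - 1e-10  # Theorem 2.1
    bounds_ok = -1e-12 <= s1 <= entropy(rho1) + 1e-12        # Theorem 2.2, eq. (15)
    violations += int(not (concave_ok and bounds_ok))

print("violations:", violations)
```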

The above comparison of $S(\rho|\sigma)$ with $S(E_{\mathcal Q}(\rho))$ suggests another definition of conditional entropy, with the conditioning given not in terms of a density matrix $\sigma$ but only in terms of a resolution $\mathcal Q$ of the identity:

$$S(\rho|\mathcal Q) = \sum_j \frac{\dim Q_j}{\dim I}\,F(\rho,Q_j). \qquad (16)$$

By (17) below we have

$$0 \le S(\rho|\mathcal Q) \le S(\rho),$$

where the first inequality is an equality if $\dim Q_j = 1$ for all $j$, and the second one is an equality if the spectral resolution is trivial, i.e. if $\mathcal Q = \{I\}$. We note that in (16) any sequence of numbers $\sigma'_j \ge 0$ (labeled in the same way as the $Q_j$'s) with $\sum_j\sigma'_j = 1$ replacing $\dim Q_j/\dim I$ would do equally well. But then we may combine and encode these data $\mathcal Q$ and $\{\sigma'_j\}$ in the density matrix $\sigma = \sum_j\sigma_jQ_j$ with $\sigma_j = \sigma'_j/\dim Q_j$. If in addition all the $\sigma_j$'s are pairwise different, then by our discussion above they and the spectral resolution $\mathcal Q$ may be recovered from $\sigma$, and we are back to our construction $S(\rho|\sigma)$.

Due to the relation $1 = \mathrm{Tr}\,\sigma = \sum_j\dim Q_j\,\sigma_j$, this second theorem is an immediate consequence of the following lemma.

Lemma 2.3. For all $\rho$ and $Q$ the inequality

$$F(\rho,Q) \le S(\rho) \qquad (17)$$

holds. If $\rho > 0$, this inequality is strict unless $Q = I$.

Before we turn to a proof we make some remarks. We conjecture that in the general case $\rho \ge 0$ the inequality (17) is strict unless $Q\rho = \rho$. This would imply that the second inequality in (15) is strict unless $\sigma\rho = (\mathrm{Tr}\,\sigma\rho)\,\rho$, which means the following: any $\sigma$ with $\sigma\rho = (\mathrm{Tr}\,\sigma\rho)\,\rho$ is of the form $\sigma = (\mathrm{Tr}\,\sigma\rho)\,P(\rho) + \sigma'$ with $(I - P(\rho))\sigma' = \sigma'$.

Instead of $F(\rho,Q)$ one might be tempted to consider the quantity (see (2))

$$\tilde F(\rho,Q) = -\mathrm{Tr}\big(Q\rho Q\ln(Q\rho Q)\big) \ge 0$$

and try to prove $\tilde F(\rho,Q) \le S(\rho)$. Obviously we have $\tilde F(\rho,Q) \ge F(\rho,Q)$. Consider, however, the case where $\dim Q = 1$ and $\rho = P$, $\dim P = 1$ (i.e. $\rho$ is pure), with $P$ chosen such that $QPQ = (\mathrm{Tr}\,QP)\,Q$ satisfies $0 < \mathrm{Tr}\,PQ < 1$. Then $0 = S(\rho = P) < \tilde F(\rho = P, Q)$. Furthermore, one has $F(\rho,Q) < S(\rho_Q)$ when $0 < \mathrm{Tr}\,Q\rho Q < 1$ and $S(\rho_Q) > 0$. But it does not make sense to replace $F(\rho,Q)$ by $S(\rho_Q)$ as an alternative, since $S(\rho_Q)$ is only defined when $Q\rho Q \ne 0$. Even if $Q\rho Q \ne 0$, one does not have $S(\rho_Q) \le S(\rho)$ in general. To see this we consider an example. For any $0 \ne \psi \in \mathcal H$ let $P_\psi$ be the one-dimensional projection onto the subspace spanned by $\psi$.

Example 2.1. Let $\dim\mathcal H = 4$ with $\psi_1, \psi_2, \psi_3, \psi_4$ being an orthonormal basis. Let $Q = P_{\psi_1} + P_{\psi_2}$ be the two-dimensional projection onto the subspace spanned by $\psi_1$ and $\psi_2$. Choose $\rho(\phi_1,\phi_2) = p_1P_{\psi'_1} + p_2P_{\psi'_2}$, $p_1 + p_2 = 1$, with

$$\psi'_1 = \cos\phi_1\,\psi_1 + \sin\phi_1\,\psi_3, \qquad \psi'_2 = \cos\phi_2\,\psi_2 + \sin\phi_2\,\psi_4, \qquad \cos\phi_1 \ne 0 \ne \cos\phi_2.$$

Then

$$\rho(\phi_1,\phi_2)_Q = \frac{\cos^2\phi_1\,p_1}{\cos^2\phi_1\,p_1 + \cos^2\phi_2\,p_2}\,P_{\psi_1} + \frac{\cos^2\phi_2\,p_2}{\cos^2\phi_1\,p_1 + \cos^2\phi_2\,p_2}\,P_{\psi_2}.$$

Assume $0 < p_1 < 1$, such that $S(\rho(\phi_1,\phi_2)) \ne 0$, and choose $\phi_1$ and $\phi_2$ such that $\cos^2\phi_1\,p_1 = \cos^2\phi_2\,p_2$. This gives $\rho(\phi_1,\phi_2)_Q = \frac12 Q$ and $S(\rho(\phi_1,\phi_2)_Q) = \ln 2 > S(\rho(\phi_1,\phi_2))$ whenever $p_1 \ne 1/2$. On the other hand, some easy estimates show that indeed $F(\rho(\phi_1,\phi_2), Q) < S(\rho(\phi_1,\phi_2))$ holds for all such $\phi_1$ and $\phi_2$.
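The numbers in Example 2.1 can be reproduced directly. The sketch assumes NumPy, takes $\psi_1,\ldots,\psi_4$ to be the standard basis of $\mathbb C^4$, and picks the illustrative values $p_1 = 0.8$, $\cos^2\phi_1 = 0.2$, $\cos^2\phi_2 = 0.8$, which satisfy $\cos^2\phi_1\,p_1 = \cos^2\phi_2\,p_2$. The helper `proj` is a hypothetical name for $P_\psi$.

```python
import numpy as np

def entropy(rho, eps=1e-12):
    """Von Neumann entropy -Tr rho ln rho (0 ln 0 = 0)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > eps]
    return float(-np.sum(w * np.log(w)))

def proj(v):
    """One-dimensional projection P_v onto the span of the unit vector v."""
    return np.outer(v, v.conj())

e = np.eye(4)  # psi_1, ..., psi_4 taken as the standard basis (illustrative)
p1, p2 = 0.8, 0.2
# cos^2(phi_1) p1 = 0.2 * 0.8 and cos^2(phi_2) p2 = 0.8 * 0.2 coincide.
phi1, phi2 = np.arccos(np.sqrt(0.2)), np.arccos(np.sqrt(0.8))
psi1p = np.cos(phi1) * e[0] + np.sin(phi1) * e[2]
psi2p = np.cos(phi2) * e[1] + np.sin(phi2) * e[3]
rho = p1 * proj(psi1p) + p2 * proj(psi2p)

Q = proj(e[0]) + proj(e[1])
A = Q @ rho @ Q
t = np.trace(A)
rho_Q = A / t              # eq. (4)
F_val = t * entropy(rho_Q)  # eq. (3)
print(entropy(rho), entropy(rho_Q), F_val)
```

So $S(\rho_Q) = \ln 2$ exceeds $S(\rho)$, while $F(\rho,Q)$ stays below $S(\rho)$, as stated.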

This example also shows that in general neither $\rho_Q$ nor $S(\rho_Q)$ for $Q\rho Q = 0$ may be defined by a limiting procedure. In fact, we may let $\phi_1$ and $\phi_2$ tend to $\pi/2$ in such a way that $\cos^2\phi_2/\cos^2\phi_1$ tends to an arbitrary constant $\ge 0$, showing that in the limit for $\rho(\phi_1,\phi_2)_Q$ we may obtain an arbitrary convex combination of $P_{\psi_1}$ and $P_{\psi_2}$, and hence an arbitrary value between $0$ and $\ln 2$ for the entropy. By the convexity of the relative entropy we also have

$$F(\rho,Q) + F(\rho,I-Q) \le S\big(E_{\{Q,\,I-Q\}}(\rho)\big).$$

On the other hand, $F(\rho,Q) + F(\rho,I-Q)$ is in general not bounded above by $S(\rho)$. Indeed, consider the following example.
Example 2.2. Let the set-up be as in Example 2.1. With respect to this basis let

$$\rho(\kappa) = \frac14\begin{pmatrix} 1 & 0 & 0 & \kappa \\ 0 & 1 & \kappa & 0 \\ 0 & \kappa & 1 & 0 \\ \kappa & 0 & 0 & 1 \end{pmatrix}$$

with $0 \le \kappa \le 1$. The two two-fold degenerate eigenvalues are $\frac14(1 \pm \kappa)$. This gives $F(\rho,Q) + F(\rho,I-Q) = \ln 2$, whereas $S(\rho(\kappa)) = \ln 2 - \frac12\big((1+\kappa)\ln(1+\kappa) + (1-\kappa)\ln(1-\kappa)\big) < \ln 2$ whenever $0 < \kappa$.



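The quantity

$$\Delta S(\rho) = S(\rho) - S(\rho|\rho) \ge -\sum_i \dim P_i\,\rho_i\ln(\dim P_i\,\rho_i) \qquad (18)$$

is of special interest. The inequality is a consequence of $\dim P_i\,\rho_i \le 1$. The right hand side is non-negative, and it vanishes if and only if $\rho$ has a single non-zero eigenvalue, i.e. $\rho = (1/\dim P)\,P$ for some projection $P$ (for example $\rho$ pure or $\rho = (1/\dim I)\cdot I$); in all other cases $\Delta S(\rho) > 0$. In more detail, the inequality in (18) may also be written as follows. Let $S_{\rm cl}(p) \ge 0$ be the classical entropy of the probability distribution $p = (p_1, p_2, \ldots, p_n)$, $p_k \ge 0$, $\sum_k p_k = 1$,

$$S_{\rm cl}(p) = -\sum_{k=1}^n p_k\ln p_k, \qquad (19)$$

such that in particular

$$S(\rho) = S_{\rm cl}(p(\rho)) \qquad (20)$$

with

$$p(\rho) = (\underbrace{\rho_1,\ldots,\rho_1}_{\dim P_1},\ \underbrace{\rho_2,\ldots,\rho_2}_{\dim P_2},\ \ldots).$$

Then (18) may be rewritten as

$$0 \le S_{\rm cl}(\hat p(\rho)) \le \Delta S(\rho) \qquad (21)$$

with

$$\hat p(\rho) = (\dim P_1\,\rho_1,\ \dim P_2\,\rho_2,\ \ldots),$$

where $S_{\rm cl}(p) = 0$ if and only if $p$ is a pure state. $\Delta S(\rho)$ is obviously bounded above by $\ln\dim I = \ln\dim\mathcal H = S(\rho = (1/\dim I)\cdot I)$. Note, however, that $\Delta S(\rho)$ is not continuous at density matrices with degenerate eigenvalues: for $\dim\mathcal H = 2$ and $\rho_\epsilon = \mathrm{diag}(\frac12+\epsilon, \frac12-\epsilon)$ one has $\Delta S(\rho_\epsilon) = S(\rho_\epsilon) \to \ln 2$ as $\epsilon \to 0$, while $\Delta S(\rho_0) = 0$. It would be interesting to find the maximum of $\Delta S(\rho)$ in $\rho$ for fixed dimension $\dim\mathcal H$. Note also that

$$S(\rho) = S_{\rm cl}(p(\rho)) = S_{\rm cl}(\hat p(\rho)) + \sum_i \dim P_i\,\rho_i\ln\dim P_i \ge S_{\rm cl}(\hat p(\rho)) \qquad (22)$$

with equality if and only if $\dim P_i = 1$ for all $i$ with $\rho_i > 0$. We will discuss $\Delta S(\rho)$ below when we compare $S(\rho|\sigma)$ with Shannon's conditional entropy.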
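The bookkeeping in (18) to (22) only involves the distinct eigenvalues $\rho_i$ and their multiplicities $\dim P_i$. The following sketch (NumPy assumed; the spectrum $\rho_i = 0.3, 0.2, 0.1$ with $\dim P_i = 2, 1, 2$ is an arbitrary example, and the variable names are ours) computes $p(\rho)$, $\hat p(\rho)$ and $S(\rho|\rho)$ via (14), and checks (21) and (22).

```python
import numpy as np

def S_cl(p):
    """Classical entropy (19), with 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Distinct eigenvalues rho_i with degeneracies dim P_i (an illustrative spectrum).
rho_i = np.array([0.3, 0.2, 0.1])
dim_P = np.array([2, 1, 2])
assert abs(np.dot(rho_i, dim_P) - 1.0) < 1e-12  # Tr rho = 1

p = np.repeat(rho_i, dim_P)   # p(rho) of eq. (20)
p_hat = dim_P * rho_i         # \hat p(rho) of eq. (21)
S = S_cl(p)                   # S(rho), eq. (20)
S_cond = float(np.sum(rho_i**2 * dim_P**2 * np.log(dim_P)))  # S(rho|rho), eq. (14)
delta_S = S - S_cond          # Delta S(rho), eq. (18)
correction = float(np.sum(dim_P * rho_i * np.log(dim_P)))

print(S, S_cl(p_hat) + correction)  # equal by eq. (22)
print(0 <= S_cl(p_hat) <= delta_S)  # inequality (21)
```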

We turn to the proof of (17). First recall that $F(\rho,Q)$ is continuous in $\rho$ (and $Q$). Hence it suffices to consider the case $\rho > 0$, which implies that $Q\rho Q \ne 0$ for all $Q \ne 0$. Since $F(\rho,Q) = 0$ for $Q = 0$ and for $\dim Q = 1$, and since $F(\rho,I) = S(\rho)$, it suffices to consider the case $1 < \dim Q < \dim I$.

Now $\mathcal U(\mathcal H)$ operates transitively and continuously on the Grassmannian of all $n$-dimensional subspaces of $\mathcal H$ ($1 \le n \le \dim\mathcal H$). For each $n$ this space is therefore compact and homeomorphic to the set of all projections of dimension $n$. Obviously $\mathcal U(\mathcal H)$ operates on this set, again continuously, via $U: P \mapsto UPU^{-1}$. By (9)

$$F_n(\rho) = \sup_{Q:\,\dim Q = n} F(\rho,Q) = \sup_{U \in \mathcal U(\mathcal H)} F(\rho, UQ_0U^{-1}) = \sup_{U \in \mathcal U(\mathcal H)} F(U\rho U^{-1}, Q_0), \qquad (23)$$

which is finite for each $n$. Here $Q_0$ is any orthogonal projection with $\dim Q_0 = n$. In particular, by compactness and continuity, we may choose $Q_0$ such that $F_n(\rho) = F(\rho,Q_0)$. Consider the one-parameter unitary subgroup $U(t) = \exp(itK)$, where $K$ is an arbitrary self-adjoint operator on $\mathcal H$. Then we must have $f_K(t) = F(U(t)\rho U(-t), Q_0) \le F(\rho,Q_0) = f_K(t=0)$ for all $t$ and all self-adjoint $K$. Now it is well known that for any one-parameter family of strictly positive operators $A(t)$ which is differentiable in $t$ one has

$$\frac{d}{dt}\mathrm{Tr}\big(A(t)\ln A(t)\big) = \mathrm{Tr}\Big(\big(I + \ln A(t)\big)\,\frac{dA(t)}{dt}\Big).$$

Recalling the assumption $\rho > 0$, such that $Q_0\rho Q_0 > 0$ when restricted to the subspace $Q_0\mathcal H$ (on which $\ln Q_0\rho Q_0$ is understood), it is easy to see that $f_K(t)$ is also differentiable in $t$ at $t = 0$ and

$$\begin{aligned}\frac{d}{dt}f_K(t)\Big|_{t=0} &= -i\,\mathrm{Tr}\big((I + \ln Q_0\rho Q_0)\,Q_0[K,\rho]Q_0\big) + i\,\big(1 + \ln\mathrm{Tr}(Q_0\rho Q_0)\big)\,\mathrm{Tr}\,Q_0[K,\rho]Q_0 \\ &= i\,\mathrm{Tr}\Big(\big[\rho,\ \ln\mathrm{Tr}(Q_0\rho Q_0)\cdot Q_0 - Q_0(\ln Q_0\rho Q_0)Q_0\big]\,K\Big). \end{aligned}\qquad (24)$$

By the choice of $Q_0$ we must have $\frac{d}{dt}f_K(t)|_{t=0} = 0$ for all $K$. But then (24) implies that $\rho$ commutes with $B = Q_0B = BQ_0$, given as

$$B = \ln\mathrm{Tr}(Q_0\rho Q_0)\cdot Q_0 - Q_0(\ln Q_0\rho Q_0)Q_0.$$

This in turn implies that $\rho$ commutes with $Q_0$ itself, which is easy to see. Indeed, use the spectral representation $Q_0\rho Q_0 = \sum_k\rho'_kQ'_k$ with $Q'_k \le Q_0$, $\dim Q'_k = 1$ and $\sum_kQ'_k = Q_0$ to write $B$ as

$$B = \sum_k\Big(\ln\Big(\sum_l\rho'_l\Big) - \ln\rho'_k\Big)\,Q'_k.$$

Now write any $\psi \in Q_0\mathcal H$ as $\psi = \sum_k a_k\psi_k$, where $\psi_k$ is a unit vector in $Q'_k\mathcal H$. Set

$$\phi = \sum_k\frac{a_k}{\ln\big(\sum_l\rho'_l\big) - \ln\rho'_k}\,\psi_k,$$

which is well defined since $\sum_l\rho'_l \ne \rho'_k$ for every $k$. This follows from our assumption $n > 1$, the fact that $\ln x$ is strictly monotonic in $x$, and the fact that $\rho'_k > 0$ for all $k$, since $Q_0\rho Q_0$ restricted to $Q_0\mathcal H$ is strictly positive. By construction $\psi = B\phi$, such that $\rho\psi = \rho B\phi = B\rho\phi = Q_0B\rho\phi \in Q_0\mathcal H$. Thus $\rho$ leaves $Q_0\mathcal H$ invariant and hence commutes with $Q_0$, as was claimed. But then we have $\rho = Q_0\rho Q_0 + (I-Q_0)\rho(I-Q_0)$, which implies

$$S(\rho) = -\mathrm{Tr}\big(Q_0\rho Q_0\ln Q_0\rho Q_0\big) - \mathrm{Tr}\big((I-Q_0)\rho(I-Q_0)\ln(I-Q_0)\rho(I-Q_0)\big).$$

This gives

$$S(\rho) = F(\rho,Q_0) - \mathrm{Tr}\big((I-Q_0)\rho(I-Q_0)\ln(I-Q_0)\rho(I-Q_0)\big) - \mathrm{Tr}(Q_0\rho Q_0)\ln\mathrm{Tr}(Q_0\rho Q_0). \qquad (25)$$
The last two terms in (25), however, are non-negative, since $0 < \mathrm{Tr}(Q_0\rho Q_0) \le 1$ and $0 \le (I-Q_0)\rho(I-Q_0) \le I$. This concludes the proof of the claim (17). To prove the second part of Lemma 2.3, we observe that the last two terms in (25) vanish exactly when $(I-Q_0)\rho(I-Q_0) = 0$. But this contradicts the assumption $\rho > 0$ and $\dim Q_0 < \dim I$, the case $Q = I$ having been discussed previously. This completes the proof of Lemma 2.3.



3 Comparison with the classical case 



In this section we provide a comparison with the classical theory of Shannon (see [22] and 
for expositions e.g. (7, 14, ^4|). For the convenience of the reader and in order to establish 
notation we recall the basic facts. Let {tt, p,} be a probability space. Furthermore let 
X = {X a } and Y = {Yp} be any two partitions (up to measure zero) of into disjoint 
subsets of non-zero measure. For simplicity we will assume these partitions to be finite, 
i.e. we choose the indices a and f3 to be in the range 1 < a < n, 1 < < m. Set 
p(X) = {p a } with p a = p(X a ) > and p(Y) = {qp} with qp = piYp) > such that 
^2 a p a = 1 and YlpQp = Here and in what follows a is an index referring to X and 
(3 to Y. Then H(X) = — ^2 a p a ^Pa > and similarly H(Y) = — Q/3 m Q/3 > is 
Shannon's entropy. Actually Shannon used log 2 instead of In adapting to the situation 
where information is coded in bits, but this is not relevant for our purpose. Since 
H(X) = S c i{p{X)) this concept of information theory relates to the concept of entropy 
in classical statistical mechanics. Shannon's conditional entropy is now given as follows. 
Let 

_ p(x a n Yp) _ p(Yp n x a ) 

be conditional probabilities associated to X and Y (i.e. p a \p is the probability that X a 
will happen, given that Yp has happened). Obviously 

PalpQp = Qf3\aPa(= K A a H Bp)) (26) 
for all a, /5, which is called Bayes rule for p a \p and qp\ a . Let p^ = {p\\piP2\pi ■■,Pn\/3) an< ^ 

Q a = (Ql\a,Q2\a,--Qm\a), such that 

m n 

(3=1 a=l 

Shannon's conditional entropy is now defined as 

m 

H{X\Y) = Y,QpS d (jp p ) (28) 

(3=1 

and it satisfies 

< H(X\Y) < H(X). (29) 

We observe that the second inequality, called Shannon's inequality, is a consequence of 
the concavity of the function p i— » S c i (p) and (^?j) . It states that on average information 
on X is gained if Y is known. Also < H(X, Y) = H(Y) + H(X\Y) is symmetric in X 
and Y and satisfies 

H(Y) < H(X, Y) < H{X) + H(Y). (30) 
Actually $H(X,Y) = H(X \vee Y)$, where $\vee$ denotes the join of two partitions. The inequalities in (29) and (30) turn into equalities if the following conditions hold. $X$ and $Y$ are said to be independent if $p_{\alpha|\beta} = p_\alpha$ holds for all $\alpha$ and $\beta$. This means that $p_\beta$ is actually independent of $\beta$ and equals $p$, and $q_\alpha$ is independent of $\alpha$ and equals $q$. In particular $S_{\rm cl}(p_\beta) = S_{\rm cl}(p)$ holds for all $\beta$ and $S_{\rm cl}(q_\alpha) = S_{\rm cl}(q)$ for all $\alpha$. The second inequality in (29) and the second inequality in (30) (which are equivalent) are now equalities if and only if $X$ and $Y$ are independent. This follows from the strict concavity of $S_{\rm cl}(p)$ in $p$. Secondly, $X$ is called a consequence of $Y$ if to each $\beta$ there is $\alpha(\beta)$ such that $p_{\alpha(\beta)|\beta} = 1$. This means that $p_{\alpha|\beta} = 0$ for all $\alpha \ne \alpha(\beta)$, and hence $S_{\rm cl}(p_\beta) = 0$ for all $\beta$. Therefore the first inequality in (29), and equivalently the first inequality in (30), are equalities if and only if $X$ is a consequence of $Y$. In particular

$$H(X|X) = 0, \tag{31}$$

i.e. $H(X,X) = H(X)$.
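The symmetry of $H(X,Y)$ and the equality cases just described can be checked numerically. The sketch below (hypothetical helper names, natural logarithms, illustrative tables) builds $H(X,Y) = H(Y) + H(X|Y)$ from a joint table and evaluates it for a generic pair, for an independent pair, for a pair where $X$ is a consequence of $Y$, and for $X$ conditioned on itself.

```python
import math

def S_cl(probs):
    """Shannon entropy (natural log), with 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in probs if x > 0)

def H_cond(joint):
    """H(X|Y) = sum_beta q_beta S_cl(p_beta); rows of joint index X, columns index Y."""
    q = [sum(col) for col in zip(*joint)]
    return sum(qb * S_cl([row[b] / qb for row in joint])
               for b, qb in enumerate(q) if qb > 0)

def H_joint(joint):
    """H(X, Y) = H(Y) + H(X|Y)."""
    return S_cl([sum(col) for col in zip(*joint)]) + H_cond(joint)

def swap(joint):
    """Exchange the roles of X and Y (transpose the table)."""
    return [list(col) for col in zip(*joint)]

generic = [[0.30, 0.10], [0.05, 0.25], [0.10, 0.20]]
p, q = [0.4, 0.6], [0.25, 0.75]
independent = [[pa * qb for qb in q] for pa in p]   # mu(X_a and Y_b) = p_a q_b
consequence = [[0.2, 0.3, 0.0], [0.0, 0.0, 0.5]]    # each Y_b lies inside one X_a
self_cond = [[0.4, 0.0], [0.0, 0.6]]                # Y = X

print(H_joint(generic), H_joint(swap(generic)))     # symmetric in X and Y
print(H_cond(independent), S_cl(p))                 # equal: independence
print(H_cond(consequence), H_cond(self_cond))       # both zero, cf. (31)
```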

With this brief review of Shannon's theory we turn to a comparison with our quantum mechanical construction. Obviously (29) corresponds to (15) when we let $X$ correspond to $\rho$ and $Y$ to $\sigma$. Note, however, the difference between (31) and (14). Moreover, for the quantity $S(\rho,\sigma) = S(\sigma) + S(\rho|\sigma)$ we have the inequalities

$$S(\sigma) \le S(\rho,\sigma) \le S(\rho) + S(\sigma), \tag{32}$$

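which correspond to (30). $S(\rho,\sigma)$ is in general not symmetric in $\rho$ and $\sigma$. To see this, consider commuting $\rho$ and $\sigma$. Then we have

$$S(\rho,\sigma) = -\sum_{i,j} \frac{\dim Q_j\,\sigma_j\,p_i \dim(P_iQ_j)}{\operatorname{Tr}(\rho Q_j)} \ln\frac{\sigma_j\,p_i}{\operatorname{Tr}(\rho Q_j)}. \tag{33}$$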

We remark that if $\operatorname{Tr}(\rho Q_j) = 0$ for a fixed $j$, then $\operatorname{Tr}(P_iQ_j) = 0$ for all $i$. Also (U) is a special case of (33). Equation (33) shows that even in the commutative case $S(\rho,\sigma)$ is not symmetric in $\rho$ and $\sigma$. This implies that in the commutative case $S(\rho|\sigma)$ does not reduce to $H(X|Y)$ for any choice of $X = X(\rho)$ and $Y = Y(\sigma)$ with $H(X) = S(\rho)$ and $H(Y) = S(\sigma)$. This lack of symmetry of $S(\rho,\sigma)$ is in contrast to the symmetry of its classical counterpart $H(X,Y)$, which has an important interpretation. The relation $H(X,Y) = H(Y,X)$ is equivalent to $H(Y) + H(X|Y) = H(X) + H(Y|X)$, a consequence of Bayes rule. This means that on average the information on $Y$ plus the information on $X$ given $Y$ is equal to the information on $X$ plus the information on $Y$ given $X$. It would be interesting to see whether this failure of symmetry for $S(\rho,\sigma)$ has a sensible interpretation in the context of the familiar Alice and Bob set-up in quantum information
theory, see e.g. pO| . 
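To see the asymmetry concretely, here is a minimal Python sketch that evaluates (33) for commuting $\rho = \mathrm{diag}(r)$ and $\sigma = \mathrm{diag}(s)$ written in a common eigenbasis. It assumes our reading of (33), $S(\rho,\sigma) = -\sum_{i,j} \dim Q_j\,\sigma_j\,p_i \dim(P_iQ_j)\,\operatorname{Tr}(\rho Q_j)^{-1} \ln\big(\sigma_j p_i/\operatorname{Tr}(\rho Q_j)\big)$, where $P_i$ and $Q_j$ are the spectral projections of $\rho$ and $\sigma$ with eigenvalues $p_i$ and $\sigma_j$. The diagonal entries are illustrative values, and eigenvalues are grouped by exact equality of floats.

```python
import math
from collections import defaultdict

def spectral_groups(diag):
    """Map each distinct eigenvalue to the basis indices spanning its spectral projection."""
    groups = defaultdict(list)
    for k, x in enumerate(diag):
        groups[x].append(k)
    return groups

def S_vN(diag):
    """von Neumann entropy of a diagonal density matrix (0 ln 0 = 0)."""
    return -sum(x * math.log(x) for x in diag if x > 0)

def S_joint(r, s):
    """Our reading of (33) for commuting rho = diag(r) and sigma = diag(s)."""
    P, Q = spectral_groups(r), spectral_groups(s)
    total = 0.0
    for sigma_j, Qj in Q.items():
        tr_rho_Q = sum(r[k] for k in Qj)              # Tr(rho Q_j)
        for p_i, Pi in P.items():
            dim_PQ = len(set(Pi) & set(Qj))           # dim(P_i Q_j)
            if sigma_j == 0 or p_i == 0 or dim_PQ == 0:
                continue                              # vanishing terms (0 ln 0 = 0)
            weight = len(Qj) * sigma_j * p_i * dim_PQ / tr_rho_Q
            total -= weight * math.log(sigma_j * p_i / tr_rho_Q)
    return total

r = [0.5, 0.25, 0.25]   # rho, illustrative eigenvalues
s = [0.2, 0.2, 0.6]     # sigma, diagonal in the same basis
print(S_joint(r, s), S_joint(s, r))   # S(rho, sigma) versus S(sigma, rho)
```

By our hand evaluation the two values are about 1.205 and 1.321, so $S(\rho,\sigma) \ne S(\sigma,\rho)$ here. With `s = r` the same function gives $S(\rho,\rho) - S(\rho) = \sum_i p_i \dim P_i \ln\dim P_i$, which equals $\tfrac12\ln 2$ for this $\rho$.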
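Finally consider

$$0 \le S(\rho\|\sigma) = S(\rho) + S(\sigma) - S(\rho,\sigma) = S(\rho) - S(\rho|\sigma) \le S(\rho), \tag{34}$$

which corresponds to

$$0 \le I(X\|Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y).$$

The quantity $I(X\|Y)$, which satisfies $0 \le I(X\|Y) \le H(X)$, gives the average information gain on $X$ when $Y$ is known.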

Thus if there is no information content at all in $Y$, i.e. if $Y$ is the trivial partition, then there is no information gain on $X$:

$$I(X\|Y = \{\Omega\}) = 0. \tag{35}$$
Thus (35) corresponds to (g) when rewritten as $S(\rho\|\sigma = (1/\dim\mathcal H)\mathbb 1) = 0$. Therefore we also interpret the quantum mechanical analogue $S(\rho\|\sigma)$ as a quantum information gain for $\rho$ given $\sigma$, which by (34) can be at most $S(\rho)$. In particular the gain is maximal for all $\rho$ if all non-zero eigenvalues of $\sigma$ are non-degenerate. The gain is also maximal if $\rho\sigma = 0$, since then $S(\rho|\sigma) = 0$, see Lemma and the remark thereafter.
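The two extreme cases of the information gain can be illustrated for commuting $\rho$ and $\sigma$. The sketch below uses a rearrangement of our reading of (33) (subtracting $S(\sigma)$), namely $S(\rho|\sigma) = \sum_j \operatorname{Tr}(\sigma Q_j)\, S\big(Q_j\rho Q_j/\operatorname{Tr}(\rho Q_j)\big)$ in a common eigenbasis. This form, the convention that blocks with $\operatorname{Tr}(\rho Q_j) = 0$ contribute nothing, the helper names and the sample spectra are our assumptions.

```python
import math

def S_vN(eigs):
    """von Neumann entropy from a list of eigenvalues (0 ln 0 = 0)."""
    return -sum(x * math.log(x) for x in eigs if x > 0)

def S_cond(r, s):
    """S(rho|sigma) for commuting rho = diag(r), sigma = diag(s) (assumed form, see above)."""
    blocks = {}
    for k, x in enumerate(s):                      # group basis indices into projections Q_j
        blocks.setdefault(x, []).append(k)
    total = 0.0
    for sigma_j, Qj in blocks.items():
        tr_rho_Q = sum(r[k] for k in Qj)           # Tr(rho Q_j)
        if sigma_j == 0 or tr_rho_Q == 0:
            continue                               # assumed: such blocks contribute nothing
        rho_j = [r[k] / tr_rho_Q for k in Qj]      # eigenvalues of Q_j rho Q_j / Tr(rho Q_j)
        total += sigma_j * len(Qj) * S_vN(rho_j)   # weight Tr(sigma Q_j) = sigma_j dim Q_j
    return total

def info_gain(r, s):
    """S(rho||sigma) = S(rho) - S(rho|sigma), cf. (34)."""
    return S_vN(r) - S_cond(r, s)

r = [0.5, 0.3, 0.2]
print(info_gain(r, [1/3, 1/3, 1/3]))   # trivial conditioning: no gain
print(info_gain(r, [0.6, 0.3, 0.1]))   # non-degenerate sigma: gain equals S(rho)
print(info_gain(r, [0.4, 0.4, 0.2]))   # intermediate case
```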

Finally $\Delta S(\rho)$ (see (18)) corresponds to $I(X\|X)$ and describes the situation where $\rho$ is conditioned on itself, $\sigma = \rho$. Then by (^l|) there is non-zero information gain unless $\rho$ is pure (and then a gain is not necessary). In contrast to the classical situation, where $I(X\|X) = H(X)$ gives complete information gain when $X$ is conditioned on itself, in the quantum case there is complete information gain, $\Delta S(\rho) = S(\rho)$, if and only if all non-zero eigenvalues of $\rho$ are non-degenerate.
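Specializing our reading of (33) to $\sigma = \rho$ gives $S(\rho|\rho) = \sum_i p_i \dim P_i \ln\dim P_i$, the sum running over the non-zero eigenvalues $p_i$ of $\rho$. Assuming, as the correspondence with $I(X\|X)$ suggests, that $\Delta S(\rho) = S(\rho) - S(\rho|\rho)$, the following sketch (hypothetical helper name, illustrative spectra) checks the criterion for complete information gain.

```python
import math
from collections import Counter

def gain_on_itself(eigs):
    """Return (S(rho), Delta S(rho)) from the eigenvalues of rho listed with multiplicity.

    Uses S(rho|rho) = sum_i p_i dim P_i ln dim P_i over non-zero eigenvalues p_i
    (our specialization of (33)) and Delta S = S(rho) - S(rho|rho) (assumed).
    """
    S = -sum(x * math.log(x) for x in eigs if x > 0)
    multiplicity = Counter(x for x in eigs if x > 0)   # p_i -> dim P_i
    S_self = sum(p * d * math.log(d) for p, d in multiplicity.items())
    return S, S - S_self

for spectrum in ([0.5, 0.3, 0.2], [0.7, 0.3, 0.0, 0.0], [0.5, 0.25, 0.25]):
    S, dS = gain_on_itself(spectrum)
    print(spectrum, S, dS)   # Delta S = S exactly when the non-zero eigenvalues are non-degenerate
```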



4 Attempts of alternative constructions 

We conclude by addressing the natural question whether there is a quantity $S'(\rho|\sigma)$ which shares more properties with Shannon's conditional entropy than the $S(\rho|\sigma)$ we have given. More precisely, and by the arguments given in the preceding sections, it would be desirable for $S'(\rho|\sigma)$ to have (most of) the following properties:

1. Invariance under the group $U(\mathcal H)$: $S'(U\rho U^{-1}|U\sigma U^{-1}) = S'(\rho|\sigma)$ for all $U \in U(\mathcal H)$ (compare (10)).

2. Bounds: $0 \le S'(\rho|\sigma) \le S(\rho)$ for all $\rho$ and $\sigma$, with $S'(\rho|\rho) = 0$ and $S'(\rho|\sigma = (1/\dim\mathcal H)\mathbb 1) = S(\rho)$.

3. Classical equivalence with Shannon's conditional entropy.

4. Symmetry: $S'(\rho,\sigma) = S(\sigma) + S'(\rho|\sigma)$ is symmetric in $\rho$ and $\sigma$.

5. Continuity of $S'(\rho|\sigma)$ in $\rho$ and in $\sigma$.

6. Concavity of $S'(\rho|\sigma)$ in $\rho$ and $\sigma$.

Note that $S(\rho|\sigma)$ fulfills condition 1, condition 2 apart from the property $S(\rho|\rho) = 0$, condition 5 up to a set of measure zero, and condition 6 only with respect to $\rho$.

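The two equality requirements of condition 2 can never be satisfied simultaneously. Indeed, with the choice $\rho = \sigma = (1/\dim\mathcal H)\mathbb 1$ we would have both $S'\big((1/\dim\mathcal H)\mathbb 1\,\big|\,(1/\dim\mathcal H)\mathbb 1\big) = 0$ and $S'\big((1/\dim\mathcal H)\mathbb 1\,\big|\,(1/\dim\mathcal H)\mathbb 1\big) = \ln\dim\mathcal H$. Also the condition $S'(\rho|\rho) = 0$ combined with $S'(\rho|\sigma) \ge 0$ is incompatible with concavity of $S'(\rho|\sigma)$ in $\rho$ (condition 6). In fact, let $\rho = \lambda\rho_1 + (1-\lambda)\rho_2$, $0 < \lambda < 1$. Concavity then gives $0 = S'(\rho|\rho) \ge \lambda S'(\rho_1|\rho) + (1-\lambda) S'(\rho_2|\rho)$. Hence $S'(\rho_1|\rho) = 0$ for all $\rho_1$ for which there is $\lambda > 0$ with $\lambda\rho_1 \le \rho$. This condition is fulfilled for all $\rho_1$ whenever $\rho > 0$, so that $S'(\,\cdot\,|\rho)$ would vanish identically for every invertible $\rho$ (I owe these observations to H. Narnhofer).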
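The last step can be spelled out as follows; this is a short derivation added for the reader, assuming $\dim\mathcal H \ge 2$.

```latex
Let $\rho > 0$ have smallest eigenvalue $m$, so that $0 < m \le 1/\dim\mathcal H < 1$,
and let $\rho_1$ be any density matrix. Since $\rho_1 \le \mathbb 1$ and $\rho \ge m\,\mathbb 1$,
\[
  \lambda \rho_1 \le \lambda \mathbb 1 \le \rho \qquad \text{for } \lambda = m ,
\]
and therefore
\[
  \rho_2 = \frac{\rho - \lambda \rho_1}{1 - \lambda} \ge 0,
  \qquad
  \operatorname{Tr} \rho_2 = \frac{1 - \lambda}{1 - \lambda} = 1 .
\]
Thus $\rho = \lambda \rho_1 + (1 - \lambda)\rho_2$ with $0 < \lambda < 1$, as required in the argument above.
```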

Next let us look at condition 3, by which we mean the situation where $\rho$ and $\sigma$ commute such that $S(\rho) = H(X)$, $S(\sigma) = H(Y)$ and $S'(\rho|\sigma) = H(X|Y)$ hold for suitable $X = X(\rho)$ and $Y = Y(\sigma)$. Also the dependence of $X(\rho)$ and $Y(\sigma)$ on $\rho$ and $\sigma$, respectively, should be non-trivial with respect to their eigenvalues. In particular condition 3 means that the symmetry condition 4 must hold at least when $\rho$ and $\sigma$ commute. In view of the destruction of quantum coherence when measurements are performed, and due to the occurrence of the sum by which $S'(\rho,\sigma)$ is defined, it is unclear to the author whether the symmetry condition 4 should also hold for non-commuting $\rho$ and $\sigma$ (see, however, the construction of conditional entropy in terms of spectral resolutions of the identity below). It is natural to make the assumption on $X(\rho)$ that $\mu(X_k) = p_k(\rho)$, see (19). Then it may be shown that the continuity condition and the classical equivalence condition are not compatible. The concavity condition in $\sigma$ is at least intuitively desirable, since taking convex combinations decreases conditioning, i.e. increases uncertainty, and hence should increase conditional entropy.

We would also like to point out another difference between the classical and the 
quantum case in the way we have presented it so far. In the classical case the conditioning 
$Y$ is trivial when $Y = \{\Omega\}$, which means no information content and for which we have $H(Y) = 0$. Within the context of density matrices the only sensible candidate for a trivial conditioning is $\sigma = (1/\dim\mathcal H)\mathbb 1$, since this is the density matrix with no information content. Its von Neumann entropy, however, is maximal. Recall that we used this quantum notion of trivial conditioning in our discussion of the inequality $S(\rho|\sigma) \le S(\rho)$ (see also the discussion following (35)). We note that several authors
consider von Neumann's entropy not to be a good generalization of classical entropy 
(see e.g. [||, page 141). In fact, in classical theory finer partitions give rise to higher 
uncertainty and hence to larger classical entropy. This was the reason for the algebraic 
approach of Connes and Størmer and of Connes, Narnhofer and Thirring, in which a classical finer partitioning corresponds to a larger algebra. In particular, the larger the algebra, the larger the entropy, and similarly, the larger the conditioning algebra, the larger the conditional entropy.

We claim, however, that there is a way to reconcile this with von Neumann's entropy. Indeed, given a quantum system in the state $\rho$, the measurements one can perform without disturbing $\rho$ are given by the observables (i.e. the self-adjoint operators) in $\mathcal A(\rho)$, which by definition is the $*$-subalgebra of $\mathcal B$ consisting of all elements in $\mathcal B$ which commute with $\rho$. In particular $\mathcal A(\rho = (1/\dim\mathcal H)\mathbb 1) = \mathcal B$. In this sense again larger uncertainties correspond to larger algebras. In other words, the larger the entropy, the more measurements one can perform without disturbing the system in the given state $\rho$. To be more precise, we introduce a partial ordering $\preceq$ on the set of all density matrices (which differs from the one introduced by Uhlmann, see e.g. [26]). By definition $\rho \preceq \sigma$ ($\sigma$ is more mixed than $\rho$) if and only if a) $P \le Q$ and b) $\operatorname{Tr}\rho Q_j = \operatorname{Tr}\sigma Q_j = \sigma_j \operatorname{Tr} Q_j$ holds for all $j$, where $P = P(\rho)$ and $Q = Q(\sigma)$ are the spectral resolutions of the identity associated to $\rho$ and $\sigma$. It is easy to see that $\rho \preceq \sigma$ and $\sigma \preceq \tau$ imply $\rho \preceq \tau$, and that $\rho \preceq (1/\dim\mathcal H)\mathbb 1$ and $\rho \preceq \rho$ hold for all $\rho$. So whenever $\rho \preceq \sigma$, condition a) implies $\mathcal A(\rho) \subset \mathcal A(\sigma)$, and a) and b) combined imply $S(\rho) \le S(\sigma)$ by the concavity of the von Neumann entropy. Note, however, that the correspondence between $\rho$ and $\mathcal A(\rho)$ is not one-to-one. In fact, $\mathcal A(\rho)$ only depends on the spectral resolution of the identity $P = P(\rho)$ associated to $\rho$ and not on the eigenvalues $p_i$ of $\rho$. Indeed, one has $\mathcal A(\rho) = E_{P(\rho)}(\mathcal B)$, as one may easily verify.
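These statements can be checked on a small example. The sketch below assumes that $E_P$ denotes the map $A \mapsto \sum_i P_i A P_i$ (our reading of the notation) and reads condition a) $P \le Q$ as "every spectral projection of $\rho$ lies under one of $\sigma$" (also an assumption). It works with real $3\times 3$ matrices as nested lists and illustrative diagonal density matrices: it verifies that $E_{P(\rho)}(A)$ commutes with $\rho$, that an element commuting with $\rho$ is left unchanged by $E_{P(\rho)}$, and that a pair related by $\preceq$ has ordered entropies.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def pinch(projections, A):
    """E_P(A) = sum_i P_i A P_i (assumed meaning of E_P)."""
    d = len(A)
    out = [[0.0] * d for _ in range(d)]
    for P in projections:
        PAP = matmul(matmul(P, A), P)
        out = [[out[i][j] + PAP[i][j] for j in range(d)] for i in range(d)]
    return out

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

def S_vN(eigs):
    return -sum(x * math.log(x) for x in eigs if x > 0)

# rho = diag(0.5, 0.25, 0.25) has spectral resolution P(rho) = {diag(1,0,0), diag(0,1,1)}.
rho = diag([0.5, 0.25, 0.25])
P_rho = [diag([1.0, 0.0, 0.0]), diag([0.0, 1.0, 1.0])]
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]   # arbitrary element of B
C = [[2.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 3.0, 4.0]]    # commutes with rho

print(commutator(pinch(P_rho, A), rho))   # zero matrix: E_P(A) lies in A(rho)
print(pinch(P_rho, C) == C)               # C is fixed by E_P

# diag(0.5, 0.3, 0.2) precedes rho: its rank-one projections lie under those of rho,
# and both states give weights 0.5 and 0.5 to the two projections of rho.
print(S_vN([0.5, 0.3, 0.2]) <= S_vN([0.5, 0.25, 0.25]))
```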

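Returning to our discussion of conditions 1-6, there is a way out, however, if one considers spectral resolutions of the identity $P$ instead of density matrices. It works as follows. First observe that the actual choice of the probability space $\{\Omega, \mu\}$ for Shannon's theory is irrelevant. What is relevant are the sets of non-negative numbers $p = \{p_\alpha\}$, $q = \{q_\beta\}$, $p \vee q = \{p_{\alpha|\beta}\}$ and $q \vee p = \{q_{\beta|\alpha}\}$, subject to the following conditions, of which the last one is Bayes rule:

$$\sum_\alpha p_\alpha = \sum_\beta q_\beta = 1, \quad \sum_\beta p_{\alpha|\beta}\, q_\beta = p_\alpha, \quad \sum_\alpha q_{\beta|\alpha}\, p_\alpha = q_\beta, \quad p_{\alpha|\beta}\, q_\beta = q_{\beta|\alpha}\, p_\alpha. \tag{36}$$

Note that then

$$\sum_\beta q_{\beta|\alpha} = \frac{1}{p_\alpha}\sum_\beta p_{\alpha|\beta}\, q_\beta = 1,$$

and similarly $\sum_\alpha p_{\alpha|\beta} = 1$.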
We consider the conditions (36), which do not refer to any particular realization of the partitions $X$ and $Y$ on a probability space, as the classical analogue of the relation (10). Setting $p_{\alpha,\beta} = p_{\alpha|\beta}\, q_\beta$ and $q_{\beta,\alpha} = q_{\beta|\alpha}\, p_\alpha$, Bayes rule gives $p_{\alpha,\beta} = q_{\beta,\alpha}$. We will therefore write $H(X|Y) = H(p|q)$ by a slight abuse of notation, since all the data $p$, $q$, $p \vee q$ and $q \vee p$ in (36) are necessary for a specification of $H(X|Y)$. But given these data it makes sense to say that $p$ is a consequence of $q$ or that $p$ and $q$ are independent.

Now let $\tau = (1/\dim\mathcal H)\operatorname{Tr}$ denote the normalized trace, i.e. $\tau(\mathbb 1) = 1$. For any two spectral resolutions $P$ and $Q$ let $p_i = \tau(P_i)$, $q_j = \tau(Q_j)$, $p_{i|j} = \tau(P_iQ_j)/\tau(Q_j)$, $q_{j|i} = \tau(Q_jP_i)/\tau(P_i)$. Note that by definition all $P_i$ and all $Q_j$ are non-zero projections. The conditions (36) are obviously satisfied. We then set $H(P) = S_{\rm cl}(p)$, $H(Q) = S_{\rm cl}(q)$, such that $H(Q = \{\mathbb 1\}) = \ln\dim\mathcal H$, and finally $H(P|Q) = H(p|q)$, such that $0 \le H(P|Q) \le H(P)$ as desired. Note that now $H(P|Q)$ is completely specified by $P$ and $Q$. Also $H(P,Q) = H(Q) + H(P|Q)$ is symmetric in $P$ and $Q$.
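A minimal Python sketch of this construction, assuming real vectors for simplicity: each spectral resolution is given as a grouping of an orthonormal basis (each group spans the range of one projection), so that $\tau(P_iQ_j) = (\dim\mathcal H)^{-1}\sum_{u\in P_i,\,v\in Q_j}\langle u,v\rangle^2$. The rotation angle, the groupings and the helper names are illustrative choices. The sketch checks the normalizations in (36), the bounds $0 \le H(P|Q) \le H(P)$ and the symmetry of $H(P,Q)$ for non-commuting $P$ and $Q$.

```python
import math

def S_cl(probs):
    return -sum(x * math.log(x) for x in probs if x > 0)

def joint_table(P_groups, Q_groups, d):
    """J[i][j] = tau(P_i Q_j) for projections spanned by groups of real orthonormal vectors."""
    def overlap(u, v):
        return sum(a * b for a, b in zip(u, v)) ** 2
    return [[sum(overlap(u, v) for u in Pi for v in Qj) / d for Qj in Q_groups]
            for Pi in P_groups]

def conditional(J):
    """H(P|Q) = sum_j q_j S_cl(p_{.|j}) with p_{i|j} = J[i][j] / q_j."""
    q = [sum(col) for col in zip(*J)]
    return sum(qj * S_cl([row[j] / qj for row in J]) for j, qj in enumerate(q) if qj > 0)

def transpose(J):
    return [list(col) for col in zip(*J)]

d, theta = 3, 0.7
c, s = math.cos(theta), math.sin(theta)
e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]   # rotated orthonormal basis
P = [[e[0]], [e[1], e[2]]]                         # P_1 of rank 1, P_2 of rank 2
Q = [[v[0]], [v[1]], [v[2]]]                       # rank-one projections, not commuting with P

J = joint_table(P, Q, d)
p = [sum(row) for row in J]                        # p_i = tau(P_i)
q = [sum(col) for col in zip(*J)]                  # q_j = tau(Q_j)
H_P, H_Q = S_cl(p), S_cl(q)
H_PQ = conditional(J)                              # H(P|Q)
H_QP = conditional(transpose(J))                   # H(Q|P)
print(H_P, H_PQ, H_Q + H_PQ, H_P + H_QP)           # the last two agree: H(P,Q) is symmetric
```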

It is easy to see that $p$ is a consequence of $q$ if and only if $Q \le P$, such that $H(P|Q) = 0$ if and only if $Q \le P$. Similarly $p$ and $q$ are independent if and only if $P = \{\mathbb 1\}$ or $Q = \{\mathbb 1\}$. Therefore, whenever $H(P) \ne 0$, $H(P|Q) = H(P)$ if and only if $Q = \{\mathbb 1\}$, which in this context is the trivial conditioning and for which the entropy is zero, in contrast to our construction in terms of density matrices. Finally we set $\operatorname{Ad}U\,Q = \{UQ_jU^{-1}\}$ for any $Q$ and any unitary $U$. Then obviously $H(\operatorname{Ad}U\,P\,|\,\operatorname{Ad}U\,Q) = H(P|Q)$ (compare condition 1).
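The statements about refinement, trivial conditioning and unitary invariance can be probed with the same kind of sketch (real vectors, illustrative rotation angles, hypothetical helper names). Here $Q \le P$ is read, as an assumption, as: every range of a $Q_j$ lies inside the range of some $P_i$.

```python
import math

def S_cl(probs):
    return -sum(x * math.log(x) for x in probs if x > 0)

def H_entropies(P, Q, d):
    """Return (H(P), H(P|Q)) with tau(P_i Q_j) = (1/d) sum <u, v>^2 over the spanning vectors."""
    J = [[sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in Pi for v in Qj) / d for Qj in Q]
         for Pi in P]
    p = [sum(row) for row in J]
    q = [sum(col) for col in zip(*J)]
    H_cond = sum(qj * S_cl([row[j] / qj for row in J]) for j, qj in enumerate(q) if qj > 0)
    return S_cl(p), H_cond

def rotate(groups, theta):
    """Apply Ad U for a rotation U by theta in the (x, y)-plane to every spanning vector."""
    c, s = math.cos(theta), math.sin(theta)
    return [[[c * x - s * y, s * x + c * y, z] for x, y, z in group] for group in groups]

d = 3
e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P = [[e[0]], [e[1], e[2]]]
Q_finer = [[e[0]], [e[1]], [e[2]]]        # Q <= P: each Q_j sits inside some P_i
Q_trivial = [[e[0], e[1], e[2]]]          # Q = {1}
Q_generic = rotate([[e[0]], [e[1]], [e[2]]], 0.7)

print(H_entropies(P, Q_finer, d))         # H(P|Q) = 0
print(H_entropies(P, Q_trivial, d))       # H(P|Q) = H(P)
# Unitary invariance: apply the same rotation to both resolutions.
print(H_entropies(P, Q_generic, d), H_entropies(rotate(P, 1.1), rotate(Q_generic, 1.1), d))
```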

Since the $P_i$'s and the $Q_j$'s need not commute, this construction is a non-commutative version of Shannon's conditional entropy in (commutative) classical probability theory. Thus a classical partition $X$ is replaced by a spectral resolution of the identity $P$, which in turn corresponds to the $*$-algebra $E_P(\mathcal B)$, and which is abelian if and only if each $P_i$ is one-dimensional. The choice $Q = \{\mathbb 1\}$, giving maximal entropy $H(Q)$ and maximal conditional entropy $H(P|Q)$, corresponds to the maximal algebra $E_{\{\mathbb 1\}}(\mathcal B) = \mathcal B$. Our construction of $H(P|Q)$ differs from the construction in H, |5[.

We might have defined the conditional entropy of two density matrices $\rho$ and $\sigma$ by $H(P(\rho)|Q(\sigma))$. Conditions 1, 2 and 4 are then satisfied, but not condition 5 and condition 3, since the dependence on the eigenvalues of $\rho$ and $\sigma$ drops out. We conjecture that condition 6 is also not satisfied.

Acknowledgements: The author would like to thank M. Karowski, H. Narnhofer, 